Perception of Consonant Clusters and Variable Gap Time*

Mike Cahill 

cahill@ling.ohio-state.edu 

Abstract: In every case in which measurements of labial-velar stops 
[kp, gb] have been made, it has been found that the labial and velar 
gestures are not strictly simultaneous, but rather that the velar gesture
slightly precedes the labial one (thus [kp] and not [pk]). One possible
explanation for this is that [kp] is more perceptually salient than [pk]. This
paper reports an attempt to test this hypothesis by observing listeners’ 
identifications of [apka] and [akpa] with variable gap times inserted 
between the consonantal onset and release. The results showed that [apka] 
was more readily identified than [akpa], effectively showing that perceptual 
salience cannot be invoked to explain the ordering of velar and labial 
gestures in labial-velar stops. 

INTRODUCTION 

Labial-velar stops [kp], [gb] occur in many languages from central and west 
Africa, where the bulk of them are found, as well as in a handful of languages in and 
around Papua New Guinea. 1 They are commonly described as having “simultaneous” 
closure at the labial and velar places of articulation. However, most transcriptions have 
recorded them as [kp] and not [pk], and this is no accident. Spectrographic evidence
shows that a vowel preceding a labial-velar stop makes a transition into a velar 
component, and the release of the consonant has labial characteristics, in languages as 
diverse as Dedua (from Papua New Guinea) and Efik (Ladefoged & Maddieson 1996) and 
Ibibio (Connell 1994), both from west Africa. Also, Maddieson has presented direct 
evidence that in Ewe, at least, the labial gesture both starts and ends later than the velar 
gesture, as in Figure 1 below, taken from electromagnetic articulography data in 
Maddieson (1993).



* My very great thanks to Keith Johnson, who provided invaluable aid in the design and 
implementation of every stage of the experiment reported herein. 

1 A few Creole languages of the Caribbean, such as Ndyuka of Surinam, also have labial- 
velar stops, presumably as a result of African language substrata (Huttar & Huttar 1994). 
Fig. 1 Coordination of lower lip and tongue back movements
in the Ewe word akpa. Y-axis is vertical displacement; 
horizontal lines indicate the likely duration of actual 
contact of the articulator. (Maddieson 1993) 

In this paper, then, [kp] and [pk] will both refer to articulations that are mostly
overlapping, but in [kp] the labial gesture follows the velar one, as above, while in [pk],
the labial gesture would precede the velar one. I will focus on the voiceless stop [kp] here,
though the discussion is also applicable to the voiced labial-velar stop [gb].

One of the questions arising out of research on labial-velar stops is why this partial 
or incomplete overlap should exist, rather than total simultaneity. Also, why should it be 
that universally (as far as we know) there is an asymmetry of gestural overlap and that this 
asymmetry should always be in the same direction? 

At least three possibilities exist and are worthy of consideration. One possibility is 
that there is an “ease of articulation” factor, that is, that [kp] requires less effort to
produce than [pk]. One might argue that with the condyles of the jaw acting as a pivot, 
especially if a consonant is pronounced after a vowel, it would be more natural for the 
articulators closer to the pivot point to make contact sooner than those further away; so a 
consonant made with a place of articulation further back in the mouth would be more 
likely to precede a consonant made in the front of the mouth. This could be a physiological 
explanation of the data in Hume (1996), who gives examples from several languages in 
which metathesis of consonant clusters operates to give an output in which the more 
posterior consonant precedes the more anterior one. She proposes a phonological 
constraint in which the more posterior of a consonant cluster pair is favored to precede the 
other. Arguing from the “ease of articulation” viewpoint, however, is notoriously suspect, 
since languages of the world abound in sounds which are not simple to produce. Sounds 
such as implosives, clicks, ejectives, and complex consonant clusters come to mind. A 
perusal of Ladefoged & Maddieson (1996) yields an abundance of examples. One may be 
able to argue persuasively and theoretically that certain sounds are, in fact, more difficult 
to make than others, but the existence of difficult sounds in languages of the world
makes this argument a tendency rather than a robust explanation for the phenomenon. 
Also, what is judged as “difficult” largely depends on the inventory of sounds in the 
speaker’s native language compared to the language under consideration. I judge the [kp] 
found in most Ghanaian languages to be fairly difficult, while the Ghanaian takes the [kp]
in stride but judges the phonetics of American English ‘squirrel’ to be very difficult. 

Another possibility is historical. Labial-velars seem commonly, perhaps always, to be
the reflex of a labialized stop, and this labial release may have been maintained in
modern languages. The two main historical sources of /kp/ seem to be *pw, exemplified
by Aghem (Hyman 1979), and, more often, *kw, exemplified by the Sawabantu group of
languages in western Cameroon (Mutaka and Ebobisse 1996). These and other examples
are examined in Cahill (in prep). In both of these proto-forms, the release is labial,
whereas the start of the consonant, at least in the case of *kw, is not. The possibility
is that the asymmetry present in the proto-form, that is, the labiality being skewed
more to the release, is preserved in the synchronic reflexes.

A third possibility is that, perceptually, a [kp] is more salient than a [pk]. This
implies that [kp] is easier to perceive in some way than a [pk]. 2 Again, Hume’s constraint
favoring consonant clusters with the more posterior consonant occurring first would be a 
way of expressing this tendency in phonological terms. But, as with the “ease of 
articulation” possibility, languages abound which have hard-to-hear sounds. These would 
include creaky and breathy vowels, different fricatives, labial-velars themselves, and a host 
of others. Similarly to the point made for difficult articulations, sounds which are judged 
“hard to perceive” are mainly those which do not occur in the hearer’s native language. 

It is of course possible that more than one of the above factors could be at work 
here. For example, [kp] could have developed for historical reasons, then remained as
such for ease of articulation. Each possibility also has its own set of objections. In
addition, it is hypothetically possible that [kp] exists rather than [pk] merely as an
accident of language, though the universality of [kp] makes this scenario rather dubious.
But if we assume that there is a reason (or reasons) behind the asymmetry of labial-velars, 
then it should be possible to investigate what that reason is, despite any initial difficulties. 

The experiment reported here was an attempt to test the third hypothesis. More
specifically, the hypothesis this experiment addressed was that the reason why the partial 
overlap of labial-velars is always skewed in the direction of labial release is that a labial 
release and velar onset is more perceptually salient than a velar release and labial onset. To 
test this, we spliced together sequences of [kp] and [pk], with varying gap durations, and
tested to see which was more readily identifiable. 

The result of the experiment did not support this hypothesis, but showed rather 
that [pk] was the more salient of the two clusters. 



2 Chomsky and Halle (1968), in reasoning about perception of multiply-articulated 
sounds, get the phonetics precisely backwards with respect to labial-velar stops. They 
write, “The order of release of the different closures is governed by a simple rule. In 
sounds without supplementary motions [i.e. movement of the glottis during the period of 
closure- me], the releases are simultaneous. In sounds produced with supplementary 
motions, closures are released in the order of increasing distance from the lips. The reason 
for this ordering is that only in this manner will clear auditory effects be produced, for 
acoustic effects produced inside the vocal tract will be effectively suppressed if the vocal 
tract is closed.” (1968:324). This predicts labial-velars with a simple pulmonic airstream 
should release both closures simultaneously, while labial-velars with an ingressive velaric 
airstream should release the labial closure first. However, in both cases, it is the labial 
closure which is released last (see Ladefoged 1968, Painter 1970).
METHOD



Spectrographic studies of labial-velar stops have shown a release burst 
characteristic of labial stops, but a transition from the preceding vowel characteristic of 
velar stops (Ladefoged 1968, Garnes 1975, Connell 1991, 1994, Ladefoged & Maddieson
1996). Splicing together the consonants [k] and [p] therefore gives a reasonable facsimile
of a labial-velar stop.

To produce the test sounds, the author recorded several tokens of the syllables 
[ap], [ak], [ka], and [pa] in a soundproof recording booth using a Marantz 220 tape 
recorder, with a Shure SM-48 microphone. Representatives of the appropriate tokens were
then spliced together using the CSpeech program to form the output tokens [ap-pa], [ap-ka],
[ak-ka], and [ak-pa]. (These stimuli will be referred to below with capitals, e.g.
APKA.) Silent durations of 0-200 ms, in 25 ms increments, were inserted between the
offset of voicing in [aC] and the release burst of [Ca]. For the [ap-ka] and [ak-pa] tokens,
intervals were extended to 400 ms as well. A Visual Basic program was set up so that
listeners heard the tokens in random order over headphones and selected either apa, aka,
akpa, or apka as the closest to what they heard.
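
The splicing step described above can be sketched as follows. This is only an illustration under stated assumptions, not the CSpeech procedure actually used: the sample rate, the function names, and the representation of a recording as a plain list of samples are all hypothetical. Only the 0-200 ms range in 25 ms increments comes from the text; the step size used for the intervals extended to 400 ms is not stated and is not guessed at here.

```python
# Sketch only. The sample rate is an assumption; the paper does not state one.
SAMPLE_RATE_HZ = 22050  # hypothetical value


def gap_durations_ms(start=0, stop=200, step=25):
    """Gap durations from start to stop inclusive (0-200 ms in 25 ms steps)."""
    return list(range(start, stop + 1, step))


def splice_with_gap(vc_samples, cv_samples, gap_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Join a VC token and a CV token with gap_ms of digital silence between them.

    vc_samples: samples of e.g. [ak], ending at the offset of voicing.
    cv_samples: samples of e.g. [pa], starting at the release burst.
    """
    n_silent = round(sample_rate_hz * gap_ms / 1000)
    return list(vc_samples) + [0] * n_silent + list(cv_samples)
```

For instance, `splice_with_gap(ak, pa, 75)` would give one AKPA-type stimulus with a 75 ms silent interval, where `ak` and `pa` stand for the recorded sample sequences.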

Fifteen listeners participated in the experiment, all undergraduate students from Ohio
State University. All but one had English as their mother tongue and were from Ohio. The 
exception was a Jordanian student whose first language was Arabic, but her responses 
were not markedly different from the others, so they are included as well. 

The listeners were seated in a soundproof booth, with a computer screen in front 
of them. The program played the token, and the subject used the computer mouse to click 
on a button on the screen labeled apa, aka, akpa, or apka. There was a 2-second 
interval after they clicked before the next token was played. The 47 tokens were 
randomized; when one block of 47 trials finished, another block began. The same set of 
tokens was repeated in this way four times, with a different randomized order each time, 
for a total of 168 tokens presented to each subject. The experiment was self-paced,
with a token not presented until a response was given to the previous one. The total time 
for each run of the experiment ranged from 25-40 minutes. 
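
The blocked presentation order described above can be sketched as follows. This is a minimal illustration, not the Visual Basic program used in the experiment; the function name, the token labels, and the optional seed are assumptions introduced here.

```python
import random


def blocked_random_order(tokens, n_blocks, seed=None):
    """Return a presentation order made of n_blocks blocks.

    Each block contains every token exactly once, shuffled independently,
    so a new block starts only after the previous one is finished.
    """
    rng = random.Random(seed)  # seed is optional; it only makes runs repeatable
    order = []
    for _ in range(n_blocks):
        block = list(tokens)
        rng.shuffle(block)
        order.extend(block)
    return order
```

A call such as `blocked_random_order(stimuli, 4)` corresponds to the four repetitions of the token set, each in a different random order.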

RESULTS 

Figure 2 shows the two most common responses to the AKPA stimuli. The main 
trend is that at shorter gap durations, the AKPA stimulus was perceived as “apa.” As the 
gap duration increased, the perception of “akpa” also increased. The two responses were 
approximately equal at about 80 ms, as measured by the crossover point of the two plotted
curves. As expected, virtually all the non-“akpa” responses were “apa”, having the same 
release as the stimulus; therefore, the very few responses which were “aka” or “apka” are 
not plotted. 










Fig. 2 Responses to AKPA input: filled = “akpa”, open = “apa”

Figure 3 shows the responses to the APKA stimuli. As with AKPA, at
shorter intervals the “aka” response was more often given; as the gap interval increased,
the “apka” response was increasingly given. As expected, virtually all the non-“apka”
responses were “aka”, having the same release as the stimulus; therefore, the very few 
responses which were “apa” or “akpa” are not plotted in Fig. 3. In comparison to the 
AKPA stimulus, the APKA stimulus was correctly identified at a shorter gap duration; the 
crossover from “aka” to “apka” occurred at only 25 ms (compared to 80 ms for AKPA). 
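
A crossover point of the kind reported above can be estimated by linear interpolation between the two response curves. The sketch below is one way to do this and is not necessarily how the 80 ms and 25 ms values were obtained; the function name and arguments are hypothetical, and no response data from the experiment are reproduced here.

```python
def crossover_gap_ms(gaps_ms, cluster_pct, single_pct):
    """Estimate the gap at which the cluster response overtakes the single-stop response.

    gaps_ms:     gap durations in ascending order
    cluster_pct: percentage of cluster responses (e.g. "akpa") at each gap
    single_pct:  percentage of single-consonant responses (e.g. "apa") at each gap
    Returns the interpolated gap in ms, or None if the curves never cross.
    """
    diffs = [c - s for c, s in zip(cluster_pct, single_pct)]
    if not diffs:
        return None
    if diffs[0] >= 0:
        return float(gaps_ms[0])
    for i in range(1, len(diffs)):
        if diffs[i] >= 0:
            d0, d1 = diffs[i - 1], diffs[i]
            frac = -d0 / (d1 - d0)  # d0 < 0 <= d1, so the denominator is positive
            return gaps_ms[i - 1] + frac * (gaps_ms[i] - gaps_ms[i - 1])
    return None
```

With purely synthetic curves that swap from 0/100 to 100/0 between gaps of 0 and 100 ms, the function returns the midpoint, 50 ms.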




Fig. 3 Responses to APKA input: filled = “apka”, open = “aka” 
Figure 4 contains the “akpa” and “apka” curves from Figures 2 and 3, showing
directly that the “apka” response to APKA was chosen at shorter gap durations than the 
“akpa” response to AKPA. 




Fig. 4 Responses to KP vs. PK stimuli by gap duration (ms): filled = AKPA/akpa, open = APKA/apka

An unexpected result was that for both the AKKA and APPA stimuli, at
very long intervals, subjects occasionally identified the stimulus as a heterogeneous cluster
“apka” or “akpa.” This is shown in Figs. 5-6. As above, the release consonant of the
response was the same as the stimulus for almost all responses. The bulk of these
misidentifications came from three subjects, but a few similar responses were scattered
among the others as well.
Fig. 5 Responses to AKKA stimuli by gap duration (ms): star = “aka”, circle = “apka”




Fig. 6 Responses to APPA stimuli by gap duration (ms): circle = “apa”, triangle = “akpa”



DISCUSSION 

Several conclusions may be drawn from these data.

First, at long gap durations the subjects sometimes heard a phonetic geminate as a 
cluster of heterogeneous consonants (Figs. 5-6). In English, geminate stop consonants are 
rare (the k: in bookkeeper being one example), and do not contrast with non-geminates in 